Opinion Mining of Spanish Customer Comments with Non-Expert Annotations on Mechanical Turk
Abstract
One of the major bottlenecks in the development of data-driven AI systems is the cost of reliable human annotations. The recent advent of crowdsourcing platforms such as Amazon’s Mechanical Turk, which give requesters fast and affordable access to a global workforce, greatly facilitates the creation of massive training data. However, most available studies on the effectiveness of crowdsourcing report on English data. We use Mechanical Turk annotations to train an opinion mining system that classifies Spanish consumer comments. We design three different Human Intelligence Task (HIT) strategies and report high inter-annotator agreement between non-expert and expert annotators. We evaluate the advantages and drawbacks of each HIT design and show that, in our case, non-expert annotations are a viable and cost-effective alternative to expert annotations.
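The abstract reports agreement between non-expert and expert annotators but does not name the measure used. The sketch below is one common way to compute such agreement, under stated assumptions: each comment receives several Mechanical Turk labels, these are aggregated by majority vote, and the aggregate is compared with a single expert label using Cohen's kappa. The function names, the label set (`pos`/`neg`/`neu`), and the data are hypothetical illustrations, not the paper's method or results.

```python
from collections import Counter

def majority_vote(labels):
    """Aggregate several non-expert labels into one.

    Assumption: ties are broken by first occurrence in the list.
    """
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label

def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical data: three non-expert (Turk) labels per comment, plus one expert label.
turk_labels = [
    ["pos", "pos", "neg"],
    ["neg", "neg", "neg"],
    ["neu", "pos", "neu"],
    ["pos", "pos", "pos"],
    ["neg", "neu", "neg"],
]
expert_labels = ["pos", "neg", "neu", "pos", "neu"]

aggregated = [majority_vote(labels) for labels in turk_labels]
kappa = cohen_kappa(aggregated, expert_labels)
print(aggregated)            # ['pos', 'neg', 'neu', 'pos', 'neg']
print(f"kappa = {kappa:.3f}")  # kappa = 0.706
```

Majority voting is only one aggregation choice; weighting workers by their agreement with a small expert-labelled gold set is a common alternative when worker quality varies.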